Programmable logic device with integrated network-on-chip

ABSTRACT

Systems and methods for providing a Network-On-Chip (NoC) structure on an integrated circuit for high-speed data passing. In some aspects, the NoC structure includes multiple NoC stations with a hard-IP interface having a bidirectional connection to local components of the integrated circuit. In some aspects, the NoC stations have a soft-IP interface that supports the hard-IP interface of the NoC station.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 15/298,122, filed on Oct. 19, 2016, which is a continuation of U.S. application Ser. No. 14/066,425, filed Oct. 29, 2013, the contents of which are incorporated by reference in their entirety, which claims the benefit of U.S. Provisional Application No. 61/721,844, filed Nov. 2, 2012, the contents of which are incorporated by reference in their entirety.

BACKGROUND OF THE DISCLOSURE

Existing integrated circuits such as programmable logic devices (PLDs) typically utilize “point-to-point” routing, meaning that a path between a source signal generator and one or more destinations is generally fixed at compile time. For example, a typical implementation of an A-to-B connection in a PLD involves connecting logic areas through an interconnect stack of pre-defined horizontal wires. These horizontal wires have a fixed length, are arranged into bundles, and are typically reserved for that A-to-B connection for the entire operation of the PLD's configuration bitstream. Even where a user is able to subsequently change some features of the point-to-point routing, e.g., through partial recompilation, such changes generally apply to block-level replacements, and not to cycle-by-cycle routing implementations.

Such existing routing methods may render the device inefficient, e.g., when the routing is not used every cycle. A first form of inefficiency occurs because of inefficient wire use. In a first example, when an A-to-B connection is rarely used (for example, if the signal value generated by the source logic area at A rarely changes or the destination logic area at B is rarely programmed to be affected by the result), then the conductors used to implement the A-to-B connection may unnecessarily take up metal, power, and/or logic resources. In a second example, when a multiplexed bus having N inputs is implemented in a point-to-point fashion, metal resources may be wasted on routing data from each of the N possible input wires because the multiplexed bus, by definition, outputs only one of the N input wires and ignores the other N−1 input wires. Power resources may also be wasted in these examples when spent in connection with data changes that do not affect a later computation. A more general form of this inefficient wire use occurs when more than one producer generates data that is serialized through a single consumer, or in the symmetric case where one producer produces data that is used in a round-robin fashion by two or more consumers.

A second form of inefficiency, called slack-based inefficiency, occurs when a wire is used, but below its full potential, e.g., in terms of delay. For example, if the data between a producer and a consumer is required to be transmitted every 300 ps, and the conductor between them is capable of transmitting the data on a faster, 100 ps timescale, then the 200 ps of slack time in which the conductor is idle is a form of inefficiency or wasted bandwidth. These two forms of wire underutilization, i.e., inefficient wire use and slack-based inefficiency, can occur separately or together, leading to inefficient use of resources and wasting valuable wiring, power, and programmable multiplexing resources.
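
As an illustrative aside (added for exposition and not part of the original disclosure), the slack arithmetic in the example above can be restated as a short computation; the 300 ps and 100 ps figures are taken from the preceding paragraph.

```python
# Illustrative arithmetic only: slack and utilization for the example above.
required_interval_ps = 300   # data must be transmitted every 300 ps
wire_delay_ps = 100          # the conductor can deliver the data in 100 ps

slack_ps = required_interval_ps - wire_delay_ps      # 200 ps of idle time per transfer
utilization = wire_delay_ps / required_interval_ps   # roughly one third of the wire's potential

print(f"slack = {slack_ps} ps, utilization = {utilization:.0%}")
```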

In many cases, the high-level description of the logic implemented on a PLD may already imply sharing of resources, such as sharing access to an external memory or a high-speed transceiver. To do this, it is common to synthesize higher-level structures representing busses onto PLDs. In one example, a software tool may generate an industry-defined bus as Register-Transfer-Level (RTL)/Verilog logic, which is then synthesized into an FPGA device. In this case, however, that shared bus structure is still implemented in the manner discussed above, meaning that it is actually converted into point-to-point static routing. Even in a scheme involving time-multiplexing of FPGA wires, such as the one proposed on pages 22-28 of Trimberger et al., “A Time Multiplexed FPGA,” Int'l Symposium on FPGAs, 1997, routing is still limited to an individual-wire basis and does not offer grouping capabilities.

SUMMARY OF THE INVENTION

This disclosure relates to integrated circuit devices and, particularly, to such devices having a programmable fabric and a communication network integrated with the programmable fabric for high-speed data passing.

In some aspects, a programmable integrated circuit includes a plurality of Network-On-Chip (NoC) stations, each NoC station in the plurality of NoC stations configured to receive a clock input and having a hard-IP interface. The hard-IP interface includes a bidirectional connection to a local logic area of the programmable integrated circuit, and a plurality of bidirectional connections to a respective plurality of neighbor NoC stations of the programmable integrated circuit.

In some aspects, a method is provided for configuring a user-programmable soft-IP interface for a NoC station of an integrated circuit, the soft-IP interface supporting a hard-IP interface of the NoC station. The soft-IP interface is instantiated via a software library function. At least one Quality-of-Service (QoS) parameter of the NoC station is specified for the soft-IP interface via software. The soft-IP interface is configured based on the at least one QoS parameter to provide functionality for the NoC station not otherwise provided by the hard-IP interface.

In some aspects, an integrated circuit includes a plurality of NoC stations, each NoC station in the plurality of NoC stations including clock circuitry configured to receive a clock input, a hard-IP interface, and a user-programmable soft-IP interface for configuring logic supporting the hard-IP interface. The user-programmable soft-IP interface includes QoS circuitry configured to manage at least one QoS-related metric for data traversing at least one connection of the NoC station.

In some aspects, a programmable logic device (PLD) includes a plurality of NoC stations, each NoC station configured to receive a clock input and comprising a hard-IP interface and a user-programmable soft-IP interface for configuring logic supporting the hard-IP interface. The hard-IP interface includes a bidirectional connection to a local logic area of the PLD and a plurality of bidirectional connections to a respective plurality of neighbor NoC stations of the programmable logic device. The user-programmable soft-IP interface includes QoS circuitry configured to manage at least one QoS-related metric for data traversing at least one connection of the NoC station.

In some aspects, a NoC interface includes bus-oriented hard-IP interface circuitry configured to provide data transfer on a standardized connection; bus-oriented soft-IP interface circuitry configured to receive data from the hard-IP interface circuitry on the standardized connection and provide additional data management functionality not provided for by the hard-IP interface, where the soft-IP interface is user customizable; and bus circuitry configured to transfer data between the soft-IP interface circuitry and a bus-oriented external logic block.

BRIEF DESCRIPTION OF THE DRAWINGS

Further features of the invention, its nature and various advantages will be apparent upon consideration of the following detailed description, taken in conjunction with the accompanying drawings, in which like reference characters refer to like parts throughout, and in which:

FIG. 1 depicts an illustrative floorplan of an FPGA in accordance with an implementation;

FIG. 2 depicts an illustrative mesh-based NoC routing structure for an FPGA in accordance with an implementation;

FIG. 3 depicts an illustrative unidirectional ring-based NoC routing structure for an FPGA in accordance with an implementation;

FIG. 4 depicts an illustrative bidirectional ring-based NoC routing structure for an FPGA in accordance with an implementation;

FIG. 5 depicts an illustrative asymmetric NoC routing structure for an FPGA in accordance with an implementation;

FIG. 6 depicts an illustrative static NoC routing structure for an FPGA in accordance with an implementation;

FIG. 7 depicts an illustrative time-shared NoC routing structure for an FPGA in accordance with an implementation;

FIG. 8 depicts an illustrative NoC routing structure based on data tags for an FPGA in accordance with an implementation;

FIG. 9 depicts a schematic diagram of functionality associated with a NoC station in accordance with an implementation;

FIG. 10 illustrates a MegaFunction for implementing a NoC station with parameterizable network operation according to an implementation;

FIG. 11 illustrates a MegaFunction with soft-logic interface functionality, implementing a NoC station according to an implementation;

FIG. 12 depicts an illustrative MegaFunction with embedded memory resources, implementing a NoC station according to an implementation;

FIG. 13 illustrates a manner in which NoC stations may be placed in an FPGA device with a vertically tiled organization according to an implementation;

FIG. 14 depicts several illustrative family variants in which NoC components scale to different device sizes in accordance with some implementations;

FIG. 15 depicts an illustrative floorplan of an FPGA with a NoC arbitration mechanism according to an implementation; and

FIG. 16 is a flowchart illustrating a process for configuring a user-programmable soft-IP interface for a NoC station in accordance with some implementations.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 1 depicts an illustrative floorplan 100 of an FPGA in accordance with an implementation. The floorplan 100 depicts various illustrative blocks of an FPGA. The floorplan 100 includes core logic fabric 110, which may have configurable logic blocks, look-up tables (LUTs), and/or D flip-flops (DFFs) (not explicitly shown in FIG. 1). The floorplan 100 includes memory blocks 112 and memory blocks 116. The memory blocks 112 may each be of a different bit size than the memory blocks 116. For example, in one arrangement, each of the memory blocks 112 is a 512-bit memory block, while each of the memory blocks 116 is a 4,096-bit memory block. The floorplan 100 includes variable-precision digital signal processing (DSP) blocks 114. In some arrangements, each DSP block of the DSP blocks 114 includes a number of multipliers, adders, subtractors, accumulators, and/or pipeline registers.

The floorplan 100 includes phase-locked loops (PLLs) 120 and general-purpose input-output (I/O) interfaces 122. The I/O interfaces 122 may be implemented in soft-IP and may interface with, e.g., external memory. The floorplan 100 includes hard-IP input-output (I/O) interfaces 124. The hard-IP I/O interfaces 124 may include one or more physical coding sublayer (PCS) interfaces. For example, the hard-IP I/O interfaces 124 may include 10G Ethernet interfaces. Not shown in the floorplan 100, but implied in the core logic fabric 110, is a network of routing wires and programmable switches. The network may be configured by SRAM bits, though other means are also possible, to implement routing connections between blocks of the floorplan 100.

It is common in an FPGA and other programmable logic devices to implement bandwidth resources as multiple paths in the style of the point-to-point routing schemes discussed above. Such implementations, however, can lead to inefficiency, e.g., because of underutilization of wires. To address this, some embodiments discussed herein increase efficiency by implementing a network that makes more efficient use of the wiring and programmable multiplexing resources, for example, by sharing such resources with a common transmission wire and multiple accesses onto that wire.

Presented next are a series of alternative Network-on-Chip (NoC) routing structures, each of which may be implemented in addition to the existing static routing resources on an FPGA. The disclosed NoC routing structures allow expensive connections in a floorplan (such as floorplan 100 of FIG. 1) to utilize shared routing resources and, thus, more efficiently make use of metal and silicon resources in an FPGA (or other programmable device). Conceptually, some of the disclosed NoC routing structures can be thought of as lying over an existing FPGA routing fabric, similar to a “highway” for carrying data throughout the FPGA.

For example, FIG. 2 depicts an illustrative mesh-based NoC routing structure for an FPGA in accordance with an implementation. Floorplan 200 is identical to the floorplan 100, but includes NoC stations 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, and 224, and wires interconnecting those NoC stations. Each of these wires is a bidirectional wire. The floorplan 200 illustrates a case of twelve NoC stations. Each of these NoC stations may be a source point and destination point in the NoC interconnect or a landing point for a data transfer. The wires connecting the NoC stations may preferentially be multi-bit connections. For example, in one implementation, each wire of the NoC interconnect is 64 bits wide. In another implementation, each wire of the NoC interconnect is 71 bits wide, with 7 bits dedicated to out-of-band control signals.

The logical separation of the NoC interconnect (including the NoC stations and their wires) from the traditional fabric of the floorplan 200, as depicted in FIG. 2, may allow for electrical optimization particular to the characteristics and use model of the NoC interconnect. For example, the type of bussed wires, the pipelining, the width, and/or the spacing of NoC stations may be optimized. Further, as would be understood by one of ordinary skill, based on the disclosure and teachings herein, each of the NoC stations depicted in FIG. 2 may alternatively be represented as a general I/O pad or as an on/off direct connection.

The mesh-based NoC structure illustrated in FIG. 2 is merely one topology in which NoC stations may be implemented on a structure such as an FPGA floorplan; other topologies may be used. Various aspects of the topology may be modified without departing from the scope of this disclosure, such as, but not limited to, directionality aspects of the topology, symmetry aspects, and other configuration aspects including time-sharing, multicast/broadcast, and/or any other aspect. Examples of these topologies are illustrated in FIGS. 3-8 below.

FIG. 3 depicts an illustrative unidirectional ring-based NoC routing structure for an FPGA in accordance with an implementation. Floorplan 300 is identical to the floorplan 100, but includes NoC stations 302, 304, 306, 308, 310, 312, 314, 316, 318, and 320, and wires interconnecting those NoC stations. Further, data traverses from one NoC station to another in a unidirectional, clockwise manner as indicated by the arrows in FIG. 3.

FIG. 4 depicts an illustrative bidirectional ring-based NoC routing structure for an FPGA in accordance with an implementation. Floorplan 400 is identical to the floorplan 100, but includes NoC stations 402, 404, 406, 408, 410, 412, 414, 416, 418, and 420, and wires interconnecting those NoC stations. Further, data may traverse from one NoC station to another in either a clockwise or counterclockwise manner as indicated by the directional arrows in FIG. 4.

FIG. 5 depicts an illustrative asymmetric NoC routing structure for an FPGA in accordance with an implementation. Floorplan 500 is identical to the floorplan 100, but includes NoC stations 502, 504, 506, 508, 510, 512, 514, 516, and 518, and wires interconnecting those NoC stations. As depicted in FIG. 5, the topology of NoC stations is vertically asymmetric and, in particular, NoC station 516 is associated with only two wires (rather than a 4-way cross point of wired connections such as the one associated with NoC stations 502, 504, 506, 508, 510, 512, 514, and 518).

In certain implementations, data transferred on the network is statically configured so that each NoC station receives data from at most one other NoC station and outputs data to at most one other NoC station. An advantage of this technique is that each NoC station may operate according to a common clock without creating bottleneck throughput delays in the NoC topology. For example, FIG. 6 depicts an illustrative static NoC routing structure for an FPGA in accordance with an implementation. Floorplan 600 is identical to the floorplan 100 (certain elements of the core logic fabric are omitted for purposes of illustration in FIG. 6), but includes NoC stations 602, 610, 612, 614, 616, and 624, and wires interconnecting those NoC stations.

As depicted by wire path 630 of FIG. 6, the NoC station 610 receives data from the NoC station 602 (and from no other NoC station) and provides data to the NoC station 612 (and to no other NoC station). Similarly, as depicted by wire path 640 of FIG. 6, the NoC station 616 receives data from the NoC station 614 (and from no other NoC station) and provides data to the NoC station 624 (and to no other NoC station). In some implementations, the network is pipelined and the wires of the NoC topology of the network are clocked at a higher rate than fabric-stitched connections of the network. For example, with reference to FIG. 6, the fabric-stitched connections of the network may operate at a clock of 400 MHz, while each of the NoC stations (i.e., including NoC stations 602, 610, 612, 614, 616, and 624) operates at a clock of 1 GHz. Thus, in the case that each wire connecting NoC stations is 64 bits wide, a total throughput of 64 Gbps per wire would be possible.
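
As an illustrative aside (added for exposition and not part of the original disclosure), the peak-throughput arithmetic above can be restated as a short computation; the 64-bit link width, 1 GHz NoC clock, and 400 MHz fabric clock are the example figures from the preceding paragraph.

```python
# Illustrative arithmetic only: peak throughput of one NoC link in the example above.
link_width_bits = 64            # width of each wire bundle connecting NoC stations
noc_clock_hz = 1_000_000_000    # NoC stations clocked at 1 GHz
fabric_clock_hz = 400_000_000   # fabric-stitched connections clocked at 400 MHz

link_throughput_bps = link_width_bits * noc_clock_hz      # 64 Gbps per NoC link
fabric_throughput_bps = link_width_bits * fabric_clock_hz # 25.6 Gbps at the fabric clock

print(link_throughput_bps / 1e9, "Gbps on the NoC link")
print(fabric_throughput_bps / 1e9, "Gbps at the fabric clock rate")
```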

In certain implementations, NoC stations of the network are arranged to operate in a shared manner, e.g., in a time-shared (or time-multiplexed) manner, a frequency-shared manner, or any other suitable manner. For example, FIG. 7 depicts an illustrative time-shared NoC routing structure for an FPGA in accordance with an implementation. In FIG. 7, NoC stations 702 and 714 each forward data to NoC station 712. The NoC station 712 collects the aggregate data provided by the NoC stations 702 and 714 using any suitable time-shared scheme. For example, the NoC station 712 may collect data using a round-robin scheme in which data is collected from a buffer of NoC station 702 for a first time interval and from a buffer of NoC station 714 during a second time interval, and then the round-robin scheme repeats. Further, the NoC station 712 could transfer this aggregated data into a local memory buffer or some other appropriate capture mechanism. The logic circuitry supporting the NoC station 712 may contain configuration data specifying the appropriate clock for the station and/or a time-shared/time-sliced mechanism for accepting data from the two sources (e.g., NoC stations 702 and 714).
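
The following is a minimal behavioral sketch (a hypothetical model written for exposition, not the patented circuitry) of the round-robin, time-shared collection described above for NoC station 712; the class and buffer names are assumptions.

```python
from collections import deque

# Hypothetical model of a NoC station that collects data from two upstream
# stations in a round-robin, time-shared fashion (as described for station 712).
class SharedCollector:
    def __init__(self, upstream_buffers):
        self.upstream = upstream_buffers  # one FIFO per upstream NoC station
        self.turn = 0                     # whose time slot is currently active
        self.local_memory = []            # capture mechanism for the aggregated data

    def tick(self):
        """One time interval: service the buffer whose turn it is, then rotate."""
        buf = self.upstream[self.turn]
        if buf:
            self.local_memory.append(buf.popleft())
        self.turn = (self.turn + 1) % len(self.upstream)

# Usage: stations "702" and "714" each feed a buffer of the collector "712".
buffers = [deque(["a0", "a1"]), deque(["b0", "b1"])]
collector = SharedCollector(buffers)
for _ in range(4):
    collector.tick()
print(collector.local_memory)  # ['a0', 'b0', 'a1', 'b1']
```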

In some implementations, data is appended with tags identifying whether the data is to be consumed, observed, and/or processed by a given NoC station. For example, FIG. 8 depicts an illustrative NoC routing structure based on data tags for an FPGA in accordance with an implementation. Floorplan 800 is identical to the floorplan 100, but includes NoC stations 802, 804, 806, 808, and 810, and wires interconnecting those NoC stations. In one implementation, data is generated at a location A of core logic fabric 830 and destined for a location B of the core logic fabric 830. This data traverses NoC stations 802, 804, 806, 808, and 810. In particular, a packet of data generated at A may be appended with information identifying NoC station 810 as an intended destination NoC station.

The packet would then be forwarded from the NoC station 802 to the NoC station 810 according to any specified protocol (e.g., a broadcast or multicast protocol). For example, according to an illustrative broadcast protocol, the packet may be transferred across NoC stations in the following sequence: NoC station 802, NoC station 804, NoC station 806, NoC station 808, and NoC station 810. Each of these stations inspects the packet to see if the station is specified as the intended destination in the appended information of the packet.

In the present example, only NoC station 810 is specified as the intended destination of the packet. Thus, each of NoC stations 804, 806, and 808 receives the packet, determines not to process the packet, and forwards the packet on to a next NoC station. The next NoC station may be determined locally or globally based on a broadcast scheme or in any suitable manner. The NoC station 810 eventually receives the packet, determines that it is specified to process the packet, and, based on that determination, transfers the packet into the local logic area of point B. Thus, this technique represents a model of computation in which streaming data is appended with tags indicating the NoC stations which are to process the data (i.e., transfer the data into a local logic area or perform some operation on the data other than simply forwarding it to another NoC station). Each NoC station, upon receiving data, determines whether it is specified to process the data. If so, the NoC station processes the data. Otherwise, the NoC station simply forwards the data without processing it.
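
The following is a minimal behavioral sketch (a hypothetical model written for exposition, not the patented circuitry) of the tag-based forwarding described above; the station names mirror FIG. 8, and the classes and fields are assumptions.

```python
from dataclasses import dataclass

# Hypothetical model: a packet generated at point A carries a destination tag;
# every station inspects the tag and either processes the packet or forwards it.
@dataclass
class Packet:
    destination: str   # tag appended to the data, e.g. "810"
    payload: bytes

class NocStation:
    def __init__(self, name, next_station=None, local_sink=None):
        self.name = name
        self.next_station = next_station  # next hop under the broadcast scheme
        self.local_sink = local_sink      # local logic area (point B)

    def receive(self, packet: Packet):
        if packet.destination == self.name:
            self.local_sink.append(packet.payload)   # process: hand off to local logic
        elif self.next_station is not None:
            self.next_station.receive(packet)        # not ours: forward unchanged

# Usage: chain 802 -> 804 -> 806 -> 808 -> 810; only 810 consumes the packet.
sink_b = []
s810 = NocStation("810", local_sink=sink_b)
s808 = NocStation("808", next_station=s810)
s806 = NocStation("806", next_station=s808)
s804 = NocStation("804", next_station=s806)
s802 = NocStation("802", next_station=s804)
s802.receive(Packet(destination="810", payload=b"data from A"))
print(sink_b)  # [b'data from A']
```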

FIG. 9 depicts a schematic diagram of functionality associated with a NoC station 900 in accordance with an implementation. In one embodiment, the NoC station 900 accepts clocking from global clock signals 902, has bidirectional links to each of the north, south, east, and west neighbors via links 904, 910, 906, and 912, respectively, and has a bidirectional link 908 to the local FPGA logic fabric. In the illustrated example of FIG. 9, the bidirectional link 908 is coupled to endpoint ports, which may correspond to where data enters the NoC topology from the local logic fabric and/or leaves the NoC topology for the local logic fabric.

The functionality associated with FIG. 9 may apply for different configurations of the NoC station, for example, whether the NoC station is statically switched or implements dynamic packet routing. The use of four bidirectional links (i.e., north, south, east, and west) to other NoC stations is exemplary. For example, some (or all) of the NoC stations in a given topology may use unidirectional links of the same or a different bit width or arrangement than the bidirectional links present in the network. Further, some (or all) of the NoC stations in a given topology may include fewer or more than one link to the local FPGA logic fabric. For example, zero links to the local FPGA fabric implies that the station acts only as a router and not as a source or destination point, and more than one link implies that more than one stream of data could enter the NoC station. These multiple streams could be arbitrated and/or otherwise multiplexed onto the network.

Further, some (or all) of the NoC stations in a given topology may omit horizontal links 906 and 912 to other NoC stations, thus providing vertical-only routing. Similarly, some (or all) of the NoC stations in a given topology may omit vertical links 904 and 910 to other NoC stations, thus providing horizontal-only routing. Other topologies are also possible.

In some embodiments, for example, in the case where the data is packet-routed, the NoC station 900 is configured to access additional configuration information (not shown in FIG. 9). For example, the NoC station 900 may be configured to access an address of the NoC station/block, to use selectors to choose from one or more clock resources, and/or to handle Quality-of-Service (QoS) requirements. The NoC station is optionally provided, in some embodiments, with resources such as buffering memories to store some packets, such as when the network is busy.

The QoS requirements may relate to any suitable performance parameter, such as, but not limited to, a required bit rate, latency, delay, jitter, packet dropping probability, data disposability, the priority and importance of a packet to be transmitted, and/or bit error rate. The QoS requirements may include any information related to the quality or performance of data communication in the FPGA or the NoC, such as a buffer size of a memory of the NoC station, a data width of the NoC station, and/or a store-and-forward policy of the NoC station.

A NoC station such as NoC station 900 of FIG. 9 may include a hard-IP portion and a soft-IP configurable portion. Thus, in order to configure a NoC, a mechanism may be provided for a designer to configure the soft-IP portion of each of multiple NoC stations or nodes. The mechanism may include a computer-aided design (CAD) tool. The configuration of the soft-IP portion of the NoC station may be specified according to a “MegaFunction” or library function which allows instantiation of the NoC station. A MegaFunction refers to one or more of (1) a user interface, (2) software, and (3) a supporting implementation, which together describe an ability for a user of a device to use one or more functionalities of the device in a flexible, parameterized way. The supporting MegaFunction implementation may include supporting soft logic and/or hard logic. The intervening MegaFunction software may determine how to implement the parameters supplied by the user while running the MegaFunction user interface. For example, the MegaFunction software may determine how the user-supplied parameters get translated into changes in the soft logic and/or into settings in the hard logic. In some embodiments, the MegaFunction implementation logic is generated by a graphical user interface, variously referred to as a “wizard,” “plug-in,” “MegaWizard Plug-In Manager,” or similar terminology.

According to some aspects, the MegaFunction allows parameterizability of the operation of the network. FIG. 10 illustrates a MegaFunction 1010 for implementing a NoC station 1000 with parameterizable network operation according to an implementation. As depicted by illustrative MegaFunction 1010, the MegaFunction can configure various aspects of the internal operation of the network, for example, by specifying static routes or other routing decisions (at 1012), setting a store-and-forward policy (at 1014), specifying multiplexing schemes/settings (at 1016), and/or setting any other desired operational parameters. The MegaFunction 1010 may, for example, configure aspects of the internal operation of the network by instantiating QoS flags and/or setting a buffer size of an integrated FIFO. The MegaFunction 1010 may output the RTL-level logic required to interface the hardened station/node of the NoC into the fabric, e.g., by instantiating the source and destination registers in the FPGA logic, setting the timing constraints of the paths, and/or creating the required clock crossings. In one implementation, the MegaFunction 1010 may allow the NoC to operate at a fixed high-speed clock rate, while letting the FPGA fabric run at a user-determined clock rate, which can be lower than the NoC high-speed clock rate.
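
The following sketch is purely hypothetical: the parameter names and the instantiate_noc_station helper are invented for exposition and are not the actual MegaFunction or library API. It only illustrates the kind of QoS, routing, and clocking parameters that the text says a MegaFunction may expose when instantiating a NoC station.

```python
# Hypothetical parameter set for instantiating one NoC station via a library function.
NOC_STATION_PARAMS = {
    "static_route": {"from": "west", "to": "north"},  # static routing decision (cf. 1012)
    "store_and_forward": True,                        # store-and-forward policy (cf. 1014)
    "mux_scheme": "round_robin",                      # multiplexing settings (cf. 1016)
    "qos_flags": ["low_latency"],                     # instantiated QoS flags
    "fifo_depth_words": 64,                           # buffer size of the integrated FIFO
    "noc_clock_mhz": 1000,                            # fixed high-speed NoC clock
    "fabric_clock_mhz": 400,                          # user-determined fabric clock
}

def instantiate_noc_station(params):
    """Stand-in for the library call: a real flow would emit the RTL that bridges
    fabric registers, timing constraints, and clock crossings for the hard station."""
    rtl_stub = [f"// parameter {name} = {value}" for name, value in params.items()]
    return "\n".join(rtl_stub)

print(instantiate_noc_station(NOC_STATION_PARAMS))
```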

According to some aspects, the MegaFunction may allow soft-IP configurability of the network. For example, the MegaFunction may provide an interface for soft logic, such as logic interfaces located near the FPGA fabric. The soft-logic interface may be used to configure decision-making that was not envisioned or embedded in the hardened implementation of the device. FIG. 11 illustrates a MegaFunction 1110 with such soft-logic interface functionality, implementing a NoC station 1100 according to an implementation. The MegaFunction 1110 includes soft routing decision logic 1112 in communication with hardened multiplexing circuitry 1114. The soft routing decision logic 1112 may be programmed with any type of functionality by the designer after hardening of the NoC station 1100 or device. The hardened multiplexing circuitry 1114 may send data in one or more directions as determined by the soft routing decision logic 1112. For example, the soft routing decision logic 1112 may have decided or determined that the data from the left Link is to be sent to the top Link. To accomplish this routing decision, the soft routing decision logic 1112 may send multiplexor settings to the hardened multiplexing circuitry 1114 to effect that connection. For example, the hardened multiplexing circuitry 1114 may be configured based on the received multiplexor settings to implement a target set of connections.

FIG. 16 is a flowchart illustrating a process 1600 for configuring a user-programmable soft-IP interface for a Network-On-Chip (NoC) station of an integrated circuit, where the soft-IP interface supports a hard-IP interface of the NoC station. Process 1600 may be implemented in a NoC station similar to any of the NoC stations described herein.

At 1602, the soft-IP interface for the NoC station is instantiated via a software library function. The software library function may be provided through a MegaFunction, e.g., such as any of the MegaFunction blocks illustrated in FIGS. 10, 11, and 12.

At 1604, at least one Quality-of-Service (QoS) parameter of the NoC station is specified via software. In one implementation, the at least one QoS parameter specifies a buffer size of a memory of the NoC station and/or a store-and-forward policy of the NoC station. The software may output RTL code for interfacing the soft-IP interface of the NoC station to the hard-IP interface of the NoC station.

At 1606, the soft-IP interface is configured based on the at least one QoS parameter from 1604 to provide functionality for the NoC station. The functionality may not otherwise be provided by the hard-IP interface.

In one implementation of 1606, the at least one QoS parameter specifies a data width of the NoC station, and the soft-IP interface provides data adjustment/adaptation functionality, such as breaking data wider than the NoC data width into multiple transactions or padding undersized data up to the NoC data width. For example, the soft-IP interface may be set up to provide segmentation of data received at the NoC station into smaller units of data for processing by the NoC station, if the data is of a width greater than a specified data width. The soft-IP interface may be set up to provide padding of the data received at the NoC station so that the padded data may be processed by the NoC station, if the data is of a width less than the specified data width.
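
The following is a minimal sketch (written for exposition; the function name and the bit-string representation are assumptions) of the data-width adjustment described above: data wider than the NoC data width is segmented into multiple transactions, and undersized data is padded.

```python
# Hypothetical illustration of soft-IP data-width adaptation for a 64-bit NoC.
NOC_WIDTH_BITS = 64

def adapt_to_noc_width(data_bits: str, width: int = NOC_WIDTH_BITS):
    """Split an arbitrary-width bit string into width-sized NoC transactions,
    zero-padding the final (or only) transaction if it is undersized."""
    words = [data_bits[i:i + width] for i in range(0, len(data_bits), width)]
    words[-1] = words[-1].ljust(width, "0")   # pad undersized data to the NoC width
    return words

print(adapt_to_noc_width("1" * 100))  # 100-bit datum -> two 64-bit transactions
print(adapt_to_noc_width("1" * 48))   # 48-bit datum -> one padded 64-bit transaction
```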

In one implementation of 1606, the functionality provided by the soft-IP interface includes regulating streams of data based, at least in part, on one or more QoS constraints for each respective stream of data. The one or more QoS constraints for a given stream of data may be specified, e.g., at 1604, based on an available bandwidth parameter. The regulating may be done by multiplexing the streams of data, interleaving the streams of data, and/or in any other suitable way. For example, the MegaFunction implementation can be configured to multiplex multiple transaction streams, including arbitration logic, interleaving, rate-matching, and bandwidth or QoS allocation. The MegaFunction logic 1110 may in some cases be configured by adding logic for either primitive flow control (e.g., acknowledgment (ACK) signals) or more complicated standard protocols such as high-speed bus interfaces.

In various implementations, the data width of the NoC may be set to one of multiple settings, for example, to either a data-only setting or a data-plus-metadata setting. In one illustrative example, the NoC may implement a logical 48-bit bus appended with 16 bits of metadata, such as address/control flags, in a 64-bit physically allocated datapath. A designer may generate such logic himself or herself using the configurable FPGA fabric. Alternatively or in addition, the MegaFunction may add such logic for configuring the allocation of data width.
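
The following is a minimal sketch (written for exposition; the exact field layout is an assumption) of the data-plus-metadata setting described above, packing a 48-bit logical data value and 16 bits of metadata into a single 64-bit physically allocated word.

```python
# Hypothetical packing of a 48-bit data bus plus 16 metadata bits into a 64-bit word.
DATA_BITS = 48
META_BITS = 16

def pack_word(data: int, metadata: int) -> int:
    assert data < (1 << DATA_BITS) and metadata < (1 << META_BITS)
    return (metadata << DATA_BITS) | data      # assumed layout: [63:48] metadata, [47:0] data

def unpack_word(word: int):
    return word & ((1 << DATA_BITS) - 1), word >> DATA_BITS

word = pack_word(data=0xABCDEF012345, metadata=0x00FF)
assert unpack_word(word) == (0xABCDEF012345, 0x00FF)
```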

According to some aspects, the MegaFunction implementation may be allocated separate memory resources, such as a separate store-and-forward memory component. For example, the MegaFunction can instantiate both the NoC station and a path to a nearby embedded memory block to act as a receiver buffer for traffic bursts from/to the local area over the network.

FIG. 12 depicts an illustrative MegaFunction 1210 with such embedded memory resources, implementing a NoC station 1200 according to an implementation. The MegaFunction 1210 includes an embedded memory block 1212, which may be an FPGA fabric RAM component in some implementations.

In some implementations, the hardened multiplexing circuitry 1214 may have customizable multiplexor settings and may operate similarly to the hardened multiplexing circuitry 1114 of FIG. 11. For example, the hardened multiplexing circuitry 1214 may be configured using soft routing decision logic to effect different sets of connections, e.g., depending on a user-defined design. In some embodiments, the hardened multiplexing circuitry 1214 may have fixed multiplexor settings and may implement the same set of connections without possibility of adjustment.

Memory block 1212 may implement rate-matching functionality. For example, memory block 1212 may store data that is arriving at a quicker rate than the data is exiting. Alternatively or in addition, memory block 1212 may store data when the destination is busy and/or unavailable. The rate-matching functionality may be implemented whether or not the MegaFunction implementation includes soft routing decision logic. For example, the soft routing decision logic might have decided to change the data connections, which might cause the data connections to overlap in time. In this case, for example, some of the data being routed may need to be stored in memory block 1212 during the overlap.
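
The following is a minimal behavioral sketch (a hypothetical model written for exposition, not the patented circuitry) of the rate-matching role of memory block 1212: a FIFO absorbs data that arrives faster than the downstream connection drains it.

```python
from collections import deque

# Hypothetical model of a rate-matching buffer between a fast producer and a
# slower (or intermittently busy) consumer.
class RateMatchBuffer:
    def __init__(self):
        self.fifo = deque()

    def cycle(self, arriving, drain_ready):
        """One clock cycle: enqueue any arriving words, dequeue one word if the
        destination is ready; otherwise the data simply waits in the buffer."""
        self.fifo.extend(arriving)
        return self.fifo.popleft() if drain_ready and self.fifo else None

# Usage: one word arrives every cycle, but the destination accepts only every other cycle.
buf = RateMatchBuffer()
out = [buf.cycle(arriving=[f"w{i}"], drain_ready=(i % 2 == 0)) for i in range(6)]
print(out, "still buffered:", list(buf.fifo))
```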

Some programmable devices include redundant regions with additional rows or columns of resources in a specified region which can be turned off to recover fabrication yield. In some embodiments, the pitch of NoC regions is tied to the redundancy regions. For example, a device may be constructed such that there are N physical rows of logic but where one row, denoted the redundant or spare row, is present only for repair of a failed row, leaving N−1 logical rows for use. When the device is tested, each row is tested and then one “broken” row is marked, using a programmable fuse or comparable technology, as unused. If some row fails the test, the spare row is programmed to take its place. If no row fails, then the spare row is marked as unused. In some devices, the device is additionally divided into multiple repair regions or super-rows. For example, a device may have M vertically stacked quadrants of the aforementioned N-row device. Setting N to 33 and M to 4, for example, would yield a device with M*N=132 physical rows and M*(N−1)=128 logical rows, and for which one row in any of the M regions can be independently marked as unused. In some implementations of such devices, the boundaries of redundant regions act as a natural break in the programmable logic fabric and are therefore a natural location for blocks that cannot be tiled at the individual row and/or column level. When such boundaries exist due to redundancy or a similar provision, the NoC regions may be implemented using these locations.
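
The following short sketch (written for exposition under the assumptions stated in the example above) restates the row-redundancy accounting: each of the M repair regions contributes N physical rows, one of which is retired as broken or as the unused spare, leaving 128 logical rows.

```python
# Hypothetical row-redundancy bookkeeping for the M=4, N=33 example above.
N = 33   # physical rows per repair region, including the spare
M = 4    # vertically stacked repair regions

def usable_physical_rows(broken_row_per_region):
    """Return the physical row indices available to user logic, skipping the one
    row per region marked as broken (or the spare row if no row failed)."""
    rows = []
    for region, broken in enumerate(broken_row_per_region):
        base = region * N
        rows.extend(base + r for r in range(N) if r != broken)
    return rows

# Region 2 had a failure in its physical row 7; the other regions retire their spare row.
rows = usable_physical_rows([N - 1, N - 1, 7, N - 1])
print(M * N, "physical rows,", len(rows), "logical rows")  # 132 physical, 128 logical
```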

FIG. 13 depicts a manner in which NoC stations may be placed in an FPGA device 1300 with a vertically tiled organization according to an illustrative implementation. In the illustrative example of FIG. 13, NoC stations are placed in an FPGA device 1300 with 16 regions, labeled A through P. FPGA device 1300 has four super-rows {ABCD, EFGH, IJKL, MNOP}. FPGA device 1300 additionally has NoC columns 1322, 1324, 1326, 1328, and 1330, placed in between the super-columns {AEIM, BFJN, CGKO, DHLP}, to physically hold the NoC. For example, NoC logic portions 1302, 1304, 1306, 1308, and 1310 of one or more NoC stations are placed along the NoC columns 1322, 1324, 1326, 1328, and 1330 of the FPGA device 1300. Zoomed view 1390 of the super-row EFGH shows the regular rows 1392 and the spare row 1394 inside this super-row EFGH, and the location 1396 of the NoC around this super-row.

The arrangement illustrated in FIG. 13 may have several advantages. First, this arrangement may eliminate the need for redundancy-steering logic as part of the NoC station and wiring. Rather, the logic is distributed according to the redundant regions. Second, this arrangement tends to provide a uniform absolute distance between NoC stations, since the redundancy regions are generally tied to raw silicon area due to the relationship between area and yield defects. As a result, the arrangement of FIG. 13 may allow for appropriate pipelining and constant network operating speeds across a range of device sizes.

For example, in a family of devices utilizing arrangements similar to that of FIG. 13, the NoC can be provisioned so as to be efficiently scalable. For example, FIG. 14 depicts several family variants in which NoC components scale to different device sizes while retaining common properties of a base network in accordance with an arrangement. In particular, device 1410 includes 16 device regions, device 1420 includes nine device regions, and device 1430 includes four device regions. Each of the devices 1410, 1420, and 1430 stores logic of NoC stations in their respective vertical columns. By pipelining each of these devices, a constant network speed is achieved across family members (i.e., the devices 1410, 1420, and 1430) even though the latency in clock cycles may grow with the size of the devices 1410, 1420, and 1430. A source design embedded in such an architecture would thus be re-targetable to different device family members as long as adequate care was taken in the architecture of the source design for latency-variable communication.

To facilitate practical use of NoC technology in a programmable logic device or other devices, the end product is typically verified through simulation and other means. In one embodiment, the higher-level tools with which the NoC is instantiated may also provide auto-generated simulation models. Examples of such models include Verilog/VHDL, SystemVerilog, or Bluespec models and/or transaction-level models in SystemC or other forms of expression.

Several benefits of fast-moving switched paths such as the ones enabled by the NoC systems and methods described herein involve connecting to external components. In some embodiments, the NoC is specifically tied to the operation of the two primary I/O systems: a memory system, such as through a DDR external memory interface (EMIF), and a transceiver system, such as through a high-speed serial interface (HSSI) or through a PCS (physical coding sublayer) block which terminates a physical protocol. For programmable devices with ASIC or other embedded logic components, similar connections tying those system blocks to the NoC are also envisioned.

The NoC functionality may provide additional value to the applications implemented on a device by arbitrating for these fixed resources between different requestors. For example, if two (or more) portions of the user design involve access to a single bank of DDR memory, both can place their requests onto the hardened NoC and allow the NoC's arbitration mechanism to determine which one gets access to the memory. This may lead to a reduction of user logic counts, because there is no need for the user to configure arbitration logic in this way. This may also lead to frequency improvement due to the hardened and distributed arbitration mechanism in place.

FIG. 15 illustrates such a case. In particular, FIG. 15 depicts a sample FPGA floorplan 1500 with hard-IP components, such as hard-IP blocks 1510, 1512, 1514, and 1516 and hard-IP interface stations 1523 and 1525. The hard-IP blocks 1510, 1512, 1514, and 1516 may be implemented as hardened controllers and/or physical interfaces to inputs and/or outputs of the device. The hard-IP blocks 1510, 1512, 1514, and 1516 are directly interfaced with NoC stations such as NoC stations 1520, 1522, 1524, 1526, 1528, and 1530. As illustrated in FIG. 15, the NoC is directly interfaced with a communication layer of the FPGA, in this example, the PCS of the high-speed serial interface on the right and left through interface stations 1523 and 1525, respectively.

Examples of the FPGA resources and I/O blocks with which the hard-IP blocks 1510, 1512, 1514, and 1516 or interface stations 1523 or 1525 may interface include logic fabric 1552, DSP blocks 1554, internal memory blocks 1556, clocking blocks 1558 (e.g., fractional PLLs), I/O hard-IP blocks 1560 (e.g., implementing embedded industry protocols such as PCI Express), hard-IP transceiver blocks 1562 (e.g., implementing physical layer protocols such as PCS), and high-speed serial transceiver blocks 1564. These resources are included for the purpose of illustration only, not limitation, and it will be understood that the hard-IP components of FIG. 15 may interface with other types of resources without departing from the scope of this disclosure.

The hardened components of FIG. 15 may function in whole or in part as a station on the network, but could also have additional functionality. For example, the PCS interface stations could perform a dedicated function such as framing Ethernet packets and steering payload data and header data to different destinations in the device, or could append metadata as described earlier for multicast/broadcast or for scheduling destinations and/or “worker tasks” on the device to read specific data.

The above use of the term “FPGA” is exemplary, and should be taken to include a multitude of integrated circuits, including, but not limited to, commercial FPGA devices, complex programmable logic device (CPLD) devices, configurable application-specific standard product (ASSP) devices, configurable digital signal processing (DSP) and graphics processing unit (GPU) devices, hybrid application-specific integrated circuit (ASIC) and programmable devices, or devices which are described as ASICs with programmable logic cores or programmable logic devices with embedded ASIC or ASSP cores.

It will be apparent to one of ordinary skill in the art, based on the disclosure and teachings herein, that aspects of the disclosed techniques, as described above, may be implemented in many different forms of software, firmware, and hardware in the implementations illustrated in the figures. The actual software code or specialized hardware used to implement aspects consistent with the principles of the disclosed techniques is not limiting. Thus, the operation and behavior of the aspects of the disclosed techniques were described without reference to the specific software code, it being understood that one of ordinary skill in the art would be able to design software and hardware to implement the aspects based on the description herein.

What is claimed is:
 1. An integrated circuit system, comprising: programmable logic circuitry; an array of programmable hardened circuitry stations configurable to receive data from or send the data to the programmable logic circuitry, wherein the array of programmable hardened circuitry stations comprises: a first hardened circuitry station; a second hardened circuitry station; a third hardened circuitry station; a first direct communication path between the first hardened circuitry station and the second hardened circuitry station; and a second direct communication path between the second hardened circuitry station and the third hardened circuitry station; wherein the first hardened circuitry station is configurable to route the data to the third hardened circuitry station through the second hardened circuitry station, wherein the third hardened circuitry station is configurable to process the data, and wherein the second hardened circuitry station is configurable not to process the data.
 2. The integrated circuit system of claim 1, wherein the third hardened circuitry station is configurable to send the data to the programmable logic circuitry.
 3. The integrated circuit system of claim 1, wherein the array of programmable hardened circuitry stations communicates data to or from the programmable logic circuitry in a bi-directional manner.
 4. The integrated circuit system of claim 1, wherein the array of programmable hardened circuitry stations is clocked at a different rate than the programmable logic circuitry.
 5. The integrated circuit system of claim 1, wherein the array of programmable hardened circuitry stations is communicatively coupled to the programmable logic circuitry via multiple nodes, wherein the array of programmable hardened circuitry stations comprises an array of programmable hardened intellectual property (IP) stations.
 6. The integrated circuit system of claim 1, wherein the array of programmable hardened circuitry stations is clocked at a higher rate than the programmable logic circuitry.
 7. The integrated circuit system of claim 1, wherein the array of programmable hardened circuitry stations is programmable to perform signal processing on data received from the programmable logic circuitry.
 8. The integrated circuit system of claim 1, comprising a plurality of interconnections that route data between programmable hardened circuitry stations of the array of programmable hardened circuitry stations.
 9. The integrated circuit system of claim 8, wherein the plurality of interconnections communicatively connect the first programmable hardened circuitry station to the second programmable hardened circuitry station in a unidirectional manner.
 10. The integrated circuit system of claim 8, wherein the plurality of interconnections are configurable based on software-generated code.
 11. The integrated circuit system of claim 1, wherein the array of programmable hardened circuitry stations is communicatively coupled to distributed memory, wherein each programmable hardened circuitry station of the array of programmable hardened circuitry stations is associated with at least one memory block of the distributed memory.
 12. An integrated circuit system, comprising: programmable logic circuitry; a memory interface configurable to communicatively couple to a double data rate (DDR) memory; input/output (I/O) circuitry; and a network-on-chip (NOC) comprising one or more clock-crossing buffers associated with a plurality of network-on-chip (NOC) nodes, wherein the NOC substantially spans a north-south height and east-west width of the integrated circuit system, and wherein the NOC is configurable to communicatively couple the programmable logic circuitry, the I/O circuitry, and the memory interface in a packetized and bi-directional manner.
 13. The integrated circuit system of claim 12, wherein the NOC comprises vertical NOC columns configurable to connect between logic regions of the programmable logic circuitry.
 14. The integrated circuit system of claim 12, wherein the one or more clock-crossing buffers facilitate communication between the NOC and the programmable logic circuitry by buffering communication between a first clock rate of the NOC and a second clock rate of the programmable logic circuitry.
 15. The integrated circuit system of claim 12, wherein the NOC is communicatively coupled to the programmable logic circuitry via one or more NOC nodes.
 16. The integrated circuit system of claim 12, wherein the NOC communicates data to the programmable logic circuitry and the I/O circuitry in a unidirectional manner.
 17. The integrated circuit system of claim 12, wherein the NOC is in a separate region of the integrated circuit system than the programmable logic circuitry.
 18. The integrated circuit system of claim 12, wherein the NOC is addressable.
 19. A method for operating an integrated circuit system comprising: configuring field-programmable gate array (FPGA) programmable logic circuitry; processing data using a first plurality of hardened circuitry nodes; and communicating data from the FPGA programmable logic circuitry to at least one hardened circuitry node of the first plurality of hardened circuitry nodes via a second plurality of hardened circuitry nodes configurable to operate as bi-directional NOC nodes, wherein the second plurality of hardened circuitry nodes do not process the data.
 20. The method of claim 19, wherein communicating the data from the FPGA programmable logic circuitry to the at least one hardened circuitry node comprises transferring the data in a packetized manner.